147 research outputs found
Visual Saliency, from 2D to Stereoscopic 3D: A Review of Psychophysical Methods and Computational Modelling
Visual attention is one of the most important mechanisms deployed by the human visual system to reduce the amount of information that our brain needs to process. An increasing amount of effort is being dedicated to the study of visual attention, and in particular to its computational modeling. In this thesis, we present studies focusing on several aspects of visual attention research. Our work falls into two main parts. The first part concerns the ground truths used in studies related to visual attention; the second part contains studies related to the modeling of visual attention for the stereoscopic 3D (S-3D) viewing condition. In the first part, our work starts by assessing the reliability of fixation density maps (FDM) from different eye-tracking databases. We then quantitatively identify the similarities and differences between fixation density maps and visual importance maps, which have been two widely used ground truths for attention-related applications. Next, to address the lack of ground truth in the 3D visual attention modeling community, we conduct a binocular eye-tracking experiment to create a new eye-tracking database for S-3D images. In the second part, we start by examining the impact of depth on visual attention in the S-3D viewing condition. We first quantify a so-called "depth bias" in the viewing of synthetic S-3D content on a planar stereoscopic display. We then extend our study from synthetic stimuli to S-3D images with natural content. We propose a depth-saliency-based model of 3D visual attention, which relies on the depth contrast of the scene. Two different ways of applying depth information in an S-3D visual attention model are also compared in our study. Next, we study how the center bias differs between 2D and S-3D viewing conditions, and further integrate the center bias into S-3D visual attention modeling.
Finally, based on the assumption that visual attention, combined with blur, can be used to improve the Quality of Experience of 3D-TV, we study the influence of blur on depth perception and the relationship between blur and binocular disparity.
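The depth-contrast computation at the heart of the depth-saliency model described above can be sketched in a few lines. The box-filter neighbourhood, the squared-difference contrast measure, and the window radius below are our illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def box_mean(a, r):
    """Mean over a (2r+1)x(2r+1) window, computed via an integral image."""
    p = np.pad(a, ((r + 1, r), (r + 1, r)), mode="edge")
    ii = p.cumsum(0).cumsum(1)
    k = 2 * r + 1
    s = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return s / k**2

def depth_contrast_saliency(depth, r=7):
    """Saliency of a pixel as the squared difference between its depth
    and the mean depth of its local neighbourhood, normalised to [0, 1]."""
    depth = np.asarray(depth, dtype=float)
    c = (depth - box_mean(depth, r)) ** 2
    rng = c.max() - c.min()
    return (c - c.min()) / rng if rng > 0 else np.zeros_like(c)
```

On a depth map with a single depth discontinuity, this map peaks along the discontinuity and vanishes in flat regions, which matches the intuition that regions standing out in depth attract fixations.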
Deliverable D1.2 of the PERSEE project: Perceptual-Modelling-Definition-of-the-Models
Deliverable D1.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D1.2 of the project. Its title: Perceptual-Modelling-Definition-of-the-Models
Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains
This paper presents Scalable Semantic Transfer (SST), a novel training
paradigm, to explore how to leverage the mutual benefits of the data from
different label domains (i.e. various levels of label granularity) to train a
powerful human parsing network. In practice, two common application scenarios
are addressed, termed universal parsing and dedicated parsing, where the former
aims to learn homogeneous human representations from multiple label domains and
switch predictions by only using different segmentation heads, and the latter
aims to learn a specific domain prediction while distilling the semantic
knowledge from other domains. The proposed SST has the following appealing
benefits: (1) it can capably serve as an effective training scheme to embed
semantic associations of human body parts from multiple label domains into the
human representation learning process; (2) it is an extensible semantic
transfer framework without predetermining the overall relations of multiple
label domains, which allows continuously adding human parsing datasets to
promote the training; (3) the relevant modules are only used for auxiliary
training and can be removed during inference, eliminating the extra reasoning
cost. Experimental results demonstrate SST can effectively achieve promising
universal human parsing performance as well as impressive improvements compared
to its counterparts on three human parsing benchmarks (i.e.,
PASCAL-Person-Part, ATR, and CIHP). Code is available at
https://github.com/yangjie-cv/SST.
Comment: Accepted to CVPR2
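The universal-parsing idea of one shared representation with a per-domain segmentation head can be illustrated with a toy, framework-free sketch. The linear backbone and heads, the feature size, and the use of the three benchmarks' class counts are our simplifications, not the SST architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class UniversalParser:
    """Toy sketch: one shared backbone, one segmentation head per label
    domain; predictions switch domains by selecting a head."""
    def __init__(self, feat_dim, domains):
        # Shared projection standing in for the backbone.
        self.backbone = rng.standard_normal((feat_dim, feat_dim)) / np.sqrt(feat_dim)
        # One linear head per label domain (domain name -> class count).
        self.heads = {name: rng.standard_normal((feat_dim, n)) / np.sqrt(feat_dim)
                      for name, n in domains.items()}

    def predict(self, pixels, domain):
        feats = np.tanh(pixels @ self.backbone)   # homogeneous representation
        logits = feats @ self.heads[domain]       # domain-specific head
        return logits.argmax(-1)

# Domains with different label granularity (class counts as commonly
# reported for these benchmarks).
model = UniversalParser(16, {"PASCAL-Person-Part": 7, "ATR": 18, "CIHP": 20})
x = rng.standard_normal((100, 16))               # 100 "pixels"
coarse = model.predict(x, "PASCAL-Person-Part")  # labels in [0, 7)
fine = model.predict(x, "CIHP")                  # labels in [0, 20)
```

The point of the sketch is only the structural one made in the abstract: the backbone is shared across domains, so only the head changes at inference, and the auxiliary transfer machinery can be dropped entirely.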
Saliency detection for stereoscopic images
Saliency detection techniques have been widely used in various 2D multimedia processing applications. Currently, the emerging applications of stereoscopic display require new saliency detection models for stereoscopic images. Unlike saliency detection for 2D images, saliency detection for stereoscopic images has to take depth features into account. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, luminance, texture, and depth. These four types of features are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted for the calculation of local and global contrast. A new fusion method is designed to combine the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show the superior performance of the proposed method over existing ones in saliency estimation for 3D images.
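The patch-based, Gaussian-weighted feature contrast described above can be sketched as follows. The use of the DC coefficient for luminance and the AC energy for texture, the 8x8 patch size, and the Gaussian width are our illustrative assumptions, not the paper's exact feature set (which also includes color and depth):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2.0)
    return M

def patch_saliency(img, patch=8, sigma=0.25):
    """Saliency of each patch as the sum of its feature differences to all
    other patches, weighted by a Gaussian of their spatial distance."""
    H, W = img.shape
    D = dct_matrix(patch)
    gh, gw = H // patch, W // patch
    feats, centers = [], []
    for i in range(gh):
        for j in range(gw):
            block = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            C = D @ block @ D.T
            dc = C[0, 0]                                   # ~ luminance
            ac = np.sqrt(max(float((C**2).sum() - dc**2), 0.0))  # ~ texture
            feats.append([dc, ac])
            centers.append([(i + 0.5) / gh, (j + 0.5) / gw])
    feats, centers = np.array(feats), np.array(centers)
    d_feat = np.abs(feats[:, None, :] - feats[None, :, :]).sum(-1)
    d_pos = ((centers[:, None, :] - centers[None, :, :])**2).sum(-1)
    w = np.exp(-d_pos / (2 * sigma**2))
    sal = (w * d_feat).sum(1)
    rng = sal.max() - sal.min()
    sal = (sal - sal.min()) / rng if rng > 0 else np.zeros_like(sal)
    return sal.reshape(gh, gw)
```

A small sigma emphasises local contrast, a large sigma approaches global contrast, which is how a single Gaussian weighting can cover both regimes mentioned in the abstract.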
Perceptual modelling for 2D and 3D
Deliverable D1.1 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D1.1 of the project.
C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and Generalizable Neural Surface Reconstruction
There is an emerging effort to combine the two popular 3D frameworks using
Multi-View Stereo (MVS) and Neural Implicit Surfaces (NIS) with a specific
focus on the few-shot / sparse view setting. In this paper, we introduce a
novel integration scheme that combines the multi-view stereo with neural signed
distance function representations, which potentially overcomes the limitations
of both methods. MVS uses per-view depth estimation and cross-view fusion to
generate accurate surfaces, while NIS relies on a common coordinate volume.
Based on this strategy, we propose to construct per-view cost frustum for finer
geometry estimation, and then fuse cross-view frustums and estimate the
implicit signed distance functions to tackle artifacts that are due to noise
and holes in the produced surface reconstruction. We further apply a cascade
frustum fusion strategy to effectively capture global-local information and
structural consistency. Finally, we apply cascade sampling and a
pseudo-geometric loss to foster stronger integration between the two
architectures. Extensive experiments demonstrate that our method reconstructs
robust surfaces and outperforms existing state-of-the-art methods.
Comment: Accepted by ICCV202
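Neural implicit surface methods of the kind discussed above represent geometry as the zero level set of a signed distance function (SDF). A minimal analytic stand-in for the learned SDF (a sphere, not the paper's network) shows the convention:

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside,
    zero on the surface, positive outside."""
    return np.linalg.norm(np.asarray(points, dtype=float) - np.asarray(center),
                          axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # center: inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
d = sphere_sdf(pts)  # → [-1., 0., 1.]
```

Surface extraction then amounts to finding the zero crossings of the SDF over a sampled grid (e.g. with marching cubes); the paper's contribution lies in how the SDF is predicted from fused multi-view cost frustums, which this toy example does not attempt.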
Perceptual Assessment: Final tests and Analysis.
Deliverable D6.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D6.3 of the project. Its title: Perceptual Assessment: Final tests and Analysis